A Probabilistic Model for Semantic Word Vectors
Authors
Abstract
Vector representations of words capture relationships in words’ functions and meanings. Many existing techniques for inducing such representations from data rely on a pipeline of hand-coded processing steps. Neural language models offer a principled way to learn word vectors through probabilistic modeling. However, learning word vectors via language modeling produces representations with a syntactic focus, where word similarity is based on how words are used in sentences. In this work we instead wish to learn word representations that encode word meaning – semantics. We introduce a model that learns semantically focused word vectors using a probabilistic model of documents, and we evaluate its word vectors on two sentiment analysis tasks.
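The abstract does not spell out the model, so the following is a minimal sketch of one plausible instantiation: a log-linear document model with p(w | θ) ∝ exp(θᵀφ_w + b_w), where each document has a vector θ and each word a learned vector φ_w and bias b_w. The data, dimensions, and training loop are illustrative assumptions, not necessarily the paper’s exact formulation.

```python
# Hypothetical sketch of a log-linear probabilistic document model for word
# vectors; not necessarily the paper's exact model.
import numpy as np

rng = np.random.default_rng(0)
V, D, DIM = 50, 10, 8                         # vocab size, documents, vector dim
counts = rng.poisson(1.0, (D, V))             # toy document-by-word count matrix

phi = 0.01 * rng.standard_normal((V, DIM))    # word vectors (learned)
b = np.zeros(V)                               # word biases (learned)
theta = 0.01 * rng.standard_normal((D, DIM))  # per-document vectors (learned)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

lr = 0.1
for _ in range(200):
    p = softmax(theta @ phi.T + b)            # p(w | theta) for each document
    grad = counts - counts.sum(axis=1, keepdims=True) * p   # dlogL/dlogits
    phi += lr * (grad.T @ theta) / D          # ascend the log-likelihood
    b += lr * grad.mean(axis=0)
    theta += lr * (grad @ phi) / D
```

After training, the rows of phi play the role of semantically focused word vectors: words that tend to co-occur in similar documents receive similar vectors.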
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text motivated researchers to use...
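As a concrete reference point for the TF-IDF vectors mentioned above, here is a quick sketch using scikit-learn’s TfidfVectorizer; the corpus is illustrative.

```python
# Standard TF-IDF document vectors via scikit-learn (illustrative corpus).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "word vectors capture meaning",
    "probabilistic models of documents",
    "word vectors for sentiment analysis",
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)            # sparse (docs x terms) matrix
print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))
```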
Spherical Paragraph Model
Representing texts as fixed-length vectors is central to many language processing tasks. Most traditional methods build text representations based on the simple Bag-of-Words (BoW) representation, which loses the rich semantic relations between words. Recent advances in natural language processing have shown that semantically meaningful representations of words can be efficiently acquired by dis...
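To make the Bag-of-Words baseline concrete, here is a minimal sketch (texts and vocabulary are illustrative): each text becomes a vector of raw word counts, and word order, along with much of the semantic structure, is discarded.

```python
# Minimal Bag-of-Words representation: raw word counts over a fixed vocabulary.
from collections import Counter

texts = ["the cat sat on the mat", "the dog sat"]
vocab = sorted({w for t in texts for w in t.split()})

def bow_vector(text):
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

for t in texts:
    print(t, "->", bow_vector(t))
```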
Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain. In contrast, vector space models of distributional semantics are trained on large corpora, but are typically applied to domain-general lexical disambiguation tasks. We introduce Distributional Semantic Hidden Markov Models, a novel variant ...
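As a rough, generic stand-in for the idea of an HMM whose emissions are distributional word vectors (this is not the paper’s Distributional Semantic Hidden Markov Model), here is a sketch using hmmlearn’s GaussianHMM on synthetic token vectors.

```python
# Generic HMM over word-vector emissions via hmmlearn; a stand-in sketch,
# not the model introduced in the paper.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 16))   # pretend: one 16-dim vector per word token
lengths = [6, 4]                    # two "sentences" of 6 and 4 tokens

model = GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
model.fit(X, lengths)
print(model.predict(X, lengths))    # latent content state for each token
```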
Word Semantic Representations using Bayesian Probabilistic Tensor Factorization
Many forms of word relatedness have been developed, providing different perspectives on word similarity. We introduce a Bayesian probabilistic tensor factorization model for synthesizing a single word vector representation and per-perspective linear transformations from any number of word similarity matrices. The resulting word vectors, when combined with the per-perspective linear transformati...
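A simplified, non-Bayesian sketch of the underlying factorization idea: several word-similarity matrices share one set of word vectors plus a per-perspective weight vector, S_k[i, j] ≈ Σ_d t_k[d] · v_i[d] · v_j[d]. The Bayesian treatment and the per-perspective linear transformations from the paper are omitted here.

```python
# Gradient-descent factorization of K similarity matrices into shared word
# vectors V and per-perspective weights T; a simplified, non-Bayesian sketch.
import numpy as np

rng = np.random.default_rng(0)
N, K, DIM = 20, 3, 5                         # words, perspectives, latent dim

# Toy symmetric similarity matrices built from hidden factors, so a good
# low-rank fit exists.
V_true = rng.standard_normal((N, DIM)) / DIM ** 0.5
T_true = rng.random((K, DIM)) + 0.5
S = np.stack([(V_true * t) @ V_true.T for t in T_true])

V = 0.1 * rng.standard_normal((N, DIM))      # shared word vectors (learned)
T = np.ones((K, DIM))                        # per-perspective weights (learned)

def fit_error():
    return np.linalg.norm(S - np.stack([(V * t) @ V.T for t in T]))

print("initial error:", fit_error())
lr = 5e-3
for _ in range(2000):
    for k in range(K):
        R = S[k] - (V * T[k]) @ V.T          # residual for perspective k
        V += lr * (R @ (V * T[k]))           # descend the squared error
        T[k] += lr * np.einsum("id,ij,jd->d", V, R, V)
print("final error:", fit_error())
```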
Probabilistic Semantic Analysis of Speech
This paper presents a new probabilistic approach to semantic analysis of speech. The problem of finding the semantic contents of a word chain is modeled as the problem of assigning semantic attributes to words. The discrete assignment function is characterized by random vectors and its probabilities. By computing the best of all possible statistically modeled assignments, we get the semantic cont...
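A generic Viterbi-style sketch of computing the best statistically modeled attribute assignment for a word chain; the attribute set, the probabilities, and the Markov factorization are illustrative assumptions rather than the paper’s model.

```python
# Best semantic-attribute assignment for a word chain via Viterbi decoding;
# all probabilities here are toy values.
import numpy as np

words = ["book", "flight", "boston"]
attrs = ["ACTION", "OBJECT", "CITY"]

emit = {                                  # toy p(word | attribute)
    "book":   np.array([0.7, 0.2, 0.1]),
    "flight": np.array([0.1, 0.8, 0.1]),
    "boston": np.array([0.1, 0.1, 0.8]),
}
trans = np.full((3, 3), 1 / 3)            # toy p(attr' | attr)

delta = np.log(emit[words[0]]) - np.log(3)            # uniform attribute prior
back = []
for w in words[1:]:
    scores = delta[:, None] + np.log(trans) + np.log(emit[w])[None, :]
    back.append(scores.argmax(axis=0))
    delta = scores.max(axis=0)

best = [int(delta.argmax())]              # backtrack the best assignment
for bp in reversed(back):
    best.append(int(bp[best[-1]]))
best.reverse()
print([(w, attrs[a]) for w, a in zip(words, best)])
```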